A Virtual Cybersecurity Department for Securing Digital Twins in Water Distribution Systems

Homaei, Mohammadhossein, Di Bartolo, Agustin, Mogollon-Gutierrez, Oscar, Morgado, Fernando Broncano, Rodriguez, Pablo Garcia

arXiv.org Artificial Intelligence

Digital twins (DTs) help improve real-time monitoring and decision-making in water distribution systems. However, their connectivity makes them easy targets for cyberattacks such as scanning, denial-of-service (DoS), and unauthorized access. Small and medium-sized enterprises (SMEs) that manage these systems often do not have enough budget or staff to build strong cybersecurity teams. To solve this problem, we present a Virtual Cybersecurity Department (VCD), an affordable and automated framework designed for SMEs. The VCD uses open-source tools like Zabbix for real-time monitoring, Suricata for network intrusion detection, Fail2Ban to block repeated login attempts, and simple firewall settings. To improve threat detection, we also add a machine-learning-based IDS trained on the OD-IDS2022 dataset using an improved ensemble model. Our solution gives SMEs a practical and efficient way to secure water systems using low-cost and easy-to-manage tools.
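The ensemble idea behind the VCD's machine-learning IDS can be illustrated with a minimal sketch. The detectors, thresholds, and feature names below are hypothetical stand-ins (the paper's actual model is trained on OD-IDS2022); the sketch only shows the majority-vote pattern over several weak detectors:

```python
from collections import Counter

# Hypothetical heuristic detectors standing in for the ensemble members.
# Feature names and thresholds are illustrative, not from the paper.

def dos_detector(features):
    # Flag abnormally high packet rates as a possible DoS.
    return "attack" if features["packets_per_sec"] > 1000 else "benign"

def scan_detector(features):
    # Many distinct destination ports from one source suggests scanning.
    return "attack" if features["unique_dst_ports"] > 100 else "benign"

def auth_detector(features):
    # Repeated failed logins suggest brute forcing (what Fail2Ban reacts to).
    return "attack" if features["failed_logins"] > 5 else "benign"

def ensemble_predict(features, detectors=(dos_detector, scan_detector, auth_detector)):
    """Majority vote over the individual detectors."""
    votes = Counter(d(features) for d in detectors)
    return votes.most_common(1)[0][0]
```

A scan combined with a traffic spike would out-vote the single benign detector, e.g. `ensemble_predict({"packets_per_sec": 5000, "unique_dst_ports": 200, "failed_logins": 0})` returns `"attack"`.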


HQColon: A Hybrid Interactive Machine Learning Pipeline for High Quality Colon Labeling and Segmentation

Finocchiaro, Martina, Stern, Ronja, Smith, Abraham George, Petersen, Jens, Erleben, Kenny, Ganz, Melanie

arXiv.org Artificial Intelligence

High-resolution colon segmentation is crucial for clinical and research applications, such as digital twins and personalized medicine. However, the leading open-source abdominal segmentation tool, TotalSegmentator, struggles with accuracy for the colon, which has a complex and variable shape, requiring time-intensive labeling. Here, we present the first fully automatic high-resolution colon segmentation method. To develop it, we first created a high resolution colon dataset using a pipeline that combines region growing with interactive machine learning to efficiently and accurately label the colon on CT colonography (CTC) images. Based on the generated dataset consisting of 435 labeled CTC images we trained an nnU-Net model for fully automatic colon segmentation. Our fully automatic model achieved an average symmetric surface distance of 0.2 mm (vs. 4.0 mm from TotalSegmentator) and a 95th percentile Hausdorff distance of 1.0 mm (vs. 18 mm from TotalSegmentator). Our segmentation accuracy substantially surpasses TotalSegmentator. We share our trained model and pipeline code, providing the first and only open-source tool for high-resolution colon segmentation. Additionally, we created a large-scale dataset of publicly available high-resolution colon labels.
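The region-growing half of the labeling pipeline can be sketched in miniature. This is a generic intensity-based region grower on a 2D array, not the authors' CTC implementation, and the tolerance criterion is an assumption:

```python
from collections import deque

def region_grow(image, seed, tol):
    """Grow a region from `seed`, adding 4-connected pixels whose intensity
    is within `tol` of the seed intensity. `image` is a 2D list of numbers."""
    rows, cols = len(image), len(image[0])
    sr, sc = seed
    base = image[sr][sc]
    mask = [[False] * cols for _ in range(rows)]
    queue = deque([seed])
    mask[sr][sc] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and not mask[nr][nc]
                    and abs(image[nr][nc] - base) <= tol):
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask
```

On real CT volumes the same idea runs in 3D with 6- or 26-connectivity, with the interactive ML step correcting the grown mask.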


Loki: An Open-Source Tool for Fact Verification

Li, Haonan, Han, Xudong, Wang, Hao, Wang, Yuxia, Wang, Minghan, Xing, Rui, Geng, Yilin, Zhai, Zenan, Nakov, Preslav, Baldwin, Timothy

arXiv.org Artificial Intelligence

We introduce Loki, an open-source tool designed to address the growing problem of misinformation. Loki adopts a human-centered approach, striking a balance between the quality of fact-checking and the cost of human involvement. It decomposes the fact-checking task into a five-step pipeline: breaking down long texts into individual claims, assessing their check-worthiness, generating queries, retrieving evidence, and verifying the claims. Instead of fully automating the claim verification process, Loki provides essential information at each step to assist human judgment, especially for general users such as journalists and content moderators. Moreover, it has been optimized for latency, robustness, and cost efficiency at a commercially usable level. Loki is released under an MIT license and is available on GitHub. We also provide a video presenting the system and its capabilities.
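The five-step decomposition can be sketched as a pipeline of small functions. The steps mirror the abstract, but every implementation detail below (the naive sentence split, the check-worthiness heuristic, the stubbed retrieval) is a placeholder; Loki's real components live in its GitHub repository:

```python
def decompose(text):
    # Step 1: break a long text into individual claims (naive split on ".").
    return [s.strip() for s in text.split(".") if s.strip()]

def is_checkworthy(claim):
    # Step 2: keep factual-sounding claims; the real tool uses a model here.
    return any(ch.isdigit() for ch in claim) or " is " in claim

def generate_query(claim):
    # Step 3: turn a claim into a search query.
    return claim.rstrip("?.!")

def retrieve_evidence(query):
    # Step 4: fetch supporting evidence; stubbed with a canned snippet.
    return [f"evidence for: {query}"]

def verify(claim, evidence):
    # Step 5: surface claim plus evidence for human judgment rather than
    # issuing an automatic verdict.
    return {"claim": claim, "evidence": evidence, "verdict": "needs human review"}

def fact_check(text):
    results = []
    for claim in decompose(text):
        if not is_checkworthy(claim):
            continue
        query = generate_query(claim)
        results.append(verify(claim, retrieve_evidence(query)))
    return results
```

Note that the final step deliberately returns material for a human, matching the tool's human-centered design.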


Open-Source Tool Based Framework for Automated Performance Evaluation of an AD Function

Becker, Daniel, Konthala, Sanath, Eckstein, Lutz

arXiv.org Artificial Intelligence

As automation in the field of automated driving (AD) progresses, ensuring the safety and functionality of AD functions (ADFs) becomes crucial. Virtual scenario-based testing has emerged as a prevalent method for evaluating these systems, allowing for a wider range of testing environments and reproducibility of results. This approach involves AD-equipped test vehicles operating within predefined scenarios to achieve specific driving objectives. To comprehensively assess the impact of road network properties on the performance of an ADF, varying parameters such as intersection angle, curvature and lane width is essential. However, covering all potential scenarios is impractical, necessitating the identification of feasible parameter ranges and automated generation of corresponding road networks for simulation. Automating the workflow of road network generation, parameter variation, simulation, and evaluation leads to a comprehensive understanding of an ADF's behavior in diverse road network conditions. This paper aims to investigate the influence of road network parameters on the performance of a prototypical ADF through virtual scenario-based testing, ultimately advocating the importance of road topology in assuring safety and reliability of ADFs.
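The parameter-variation step described above amounts to enumerating combinations of road-network parameters and filtering out infeasible ones. A minimal sketch, where the feasibility bounds are illustrative assumptions rather than values from the paper:

```python
from itertools import product

def generate_road_configs(angles_deg, curvatures, lane_widths_m):
    """Enumerate road-network parameter combinations and keep only the
    feasible ones. Bounds are illustrative, not taken from the paper."""
    configs = []
    for angle, curvature, width in product(angles_deg, curvatures, lane_widths_m):
        feasible = (30 <= angle <= 150          # avoid degenerate intersections
                    and curvature <= 0.1        # 1/m, drivable curve radius
                    and 2.5 <= width <= 4.0)    # plausible lane widths in metres
        if feasible:
            configs.append({"angle_deg": angle,
                            "curvature": curvature,
                            "lane_width_m": width})
    return configs
```

Each surviving configuration would then be handed to the road-network generator and simulator in the automated workflow.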


Towards MLOps: A DevOps Tools Recommender System for Machine Learning System

Shah, Pir Sami Ullah, Ahmad, Naveed, Beg, Mirza Omer

arXiv.org Artificial Intelligence

Applying DevOps practices to machine learning systems is termed MLOps; unlike traditional systems, which evolve on requirements, machine learning systems evolve on new data. The objective of MLOps is to connect different open-source tools into a pipeline that can automatically construct a dataset, train the machine learning model, deploy the model to production, and store different versions of the model and dataset. The benefit of MLOps is fast delivery of newly trained models to production so that results stay accurate. Furthermore, MLOps practice impacts the overall quality of the software product and depends heavily on open-source tools; selecting the relevant open-source tools is considered challenging, so a generalized method for choosing appropriate open-source tools is desirable. In this paper, we present a framework for a recommendation system that processes the contextual information (e.g., the nature and type of the data) of a machine learning project and recommends a relevant toolchain (tech stack) for the operationalization of machine learning systems. To check the applicability of the proposed framework, four different approaches, i.e., rule-based, random forest, decision tree, and k-nearest neighbors, were investigated, measuring precision, recall, and F-score; random forest outclassed the other approaches with the highest F-score of 0.66.
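The rule-based baseline among the four approaches can be sketched as a lookup from project context to toolchain. The rules and tool names below are invented for illustration; they are not the paper's rules:

```python
# Illustrative rules mapping project context to a toolchain (tech stack).
# Conditions are checked in order; the first match wins.
RULES = [
    (lambda ctx: ctx["data_type"] == "image",
     ["DVC", "PyTorch", "MLflow", "Kubernetes"]),
    (lambda ctx: ctx["data_type"] == "tabular" and ctx["streaming"],
     ["Kafka", "scikit-learn", "MLflow", "Airflow"]),
    (lambda ctx: ctx["data_type"] == "tabular",
     ["DVC", "scikit-learn", "MLflow", "Jenkins"]),
]

def recommend_toolchain(ctx):
    """Return the first toolchain whose rule matches the project context."""
    for condition, toolchain in RULES:
        if condition(ctx):
            return toolchain
    return ["Git", "MLflow"]  # fallback stack when no rule matches
```

The learned variants (random forest, decision tree, k-NN) replace the hand-written rules with classifiers trained on labeled project/toolchain pairs.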


OpenMORE: an open-source tool for sampling-based path replanning in ROS

Tonola, Cesare, Beschi, Manuel, Faroni, Marco, Pedrocchi, Nicola

arXiv.org Artificial Intelligence

With the spread of robots in unstructured, dynamic environments, the topic of path replanning has gained importance in the robotics community. Although the number of replanning strategies has significantly increased, there is a lack of agreed-upon libraries and tools, making the use, development, and benchmarking of new algorithms arduous. This paper introduces OpenMORE, a new open-source ROS-based C++ library for sampling-based path replanning algorithms. The library builds a framework that allows for continuous replanning and collision checking of the traversed path during the execution of the robot trajectory. Users can solve replanning tasks exploiting the already available algorithms and can easily integrate new ones, leveraging the library to manage the entire execution.


Evaluating the anticipated outcomes of MRI seizure image from open-source tool- Prototype approach

Vajiram, Jayanthi, Senthil, Aishwarya, Maurya, Utkarsh

arXiv.org Artificial Intelligence

Clinical neuroscience studies are based on neuroimaging and psychiatric evaluation. These studies are very important for genetic data processing, cognitive evaluation, and follow-up. Brain imaging covers the various ways of capturing the structure and function of the nervous system, either directly or indirectly. Neuroradiologists interpret structural and functional imaging and perform clinical analysis of the brain. Structural imaging deals with the structure of the nervous system and supports the diagnosis of intracranial conditions such as tumor and injury.


CohortFinder: an open-source tool for data-driven partitioning of biomedical image cohorts to yield robust machine learning models

Fan, Fan, Martinez, Georgia, Desilvio, Thomas, Shin, John, Chen, Yijiang, Wang, Bangchen, Ozeki, Takaya, Lafarge, Maxime W., Koelzer, Viktor H., Barisoni, Laura, Madabhushi, Anant, Viswanath, Satish E., Janowczyk, Andrew

arXiv.org Artificial Intelligence

Batch effects (BEs) refer to systematic technical differences in data collection unrelated to biological variations whose noise is shown to negatively impact machine learning (ML) model generalizability. Here we release CohortFinder, an open-source tool aimed at mitigating BEs via data-driven cohort partitioning. We demonstrate CohortFinder improves ML model performance in downstream medical image processing tasks. CohortFinder is freely available for download at cohortfinder.com.
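The partitioning idea can be illustrated with a small stand-in: bucket patients on a batch-effect proxy feature, then split each bucket across train and test so no batch group ends up only on one side. The feature name and the quantile bucketing below are illustrative assumptions, not CohortFinder's actual clustering:

```python
def partition_cohort(patients, feature, n_groups=2, test_every=3):
    """Bucket patients into quantile groups on a batch-effect proxy feature,
    then split each group across train/test so both splits cover every group.
    Illustrative stand-in for CohortFinder's data-driven partitioning."""
    ordered = sorted(patients, key=lambda p: p[feature])
    size = -(-len(ordered) // n_groups)  # ceiling division -> group size
    train, test = [], []
    for g in range(n_groups):
        group = ordered[g * size:(g + 1) * size]
        for i, patient in enumerate(group):
            # Send every `test_every`-th patient of each group to the test set.
            (test if i % test_every == 0 else train).append(patient["id"])
    return train, test
```

A naive random split, by contrast, can concentrate one scanner or staining batch entirely in the training set, which is exactly the generalizability failure batch effects cause.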


Machine Learning Operations Offer Agility, Spur Innovation - AI Summary

#artificialintelligence

This all contributes to the bottom line: a 2021 global study by McKinsey found that companies that successfully scale AI can add as much as 20 percent to their earnings before interest and taxes (EBIT). Over the last several years, Capital One has developed MLOps best practices that apply across industries: balancing user needs, adopting a common, cloud-based technology stack and foundational platforms, leveraging open-source tools, and ensuring the right level of accessibility and governance for both data and models. To build consistent processes and workflows while satisfying both groups, David recommends meeting with the application design team and subject matter experts across a breadth of use cases. Open-source ML tools (code and programs freely available for anyone to use and adapt) are core ingredients in creating a strong cloud foundation and unified tech stack. Using existing open-source tools means the business does not need to devote precious technical resources to reinventing the wheel, quickening the pace at which teams can build and deploy models.


Train a Custom Object Detector with Detectron2 and FiftyOne

#artificialintelligence

Combine the dataset curation of FiftyOne with the model training of Detectron2 to easily train custom detection models.

Image 71df582bfb39b541 from the Open Images V6 dataset (CC-BY 2.0), visualized in FiftyOne.

In recent years, every aspect of the Machine Learning (ML) lifecycle has had tooling developed to make it easier to bring a custom model from an idea to a reality. The most exciting part is that the community has a propensity for open-source tools, like PyTorch and TensorFlow, allowing the model development process to be more transparent and replicable.

In this post, we take a look at how to integrate two open-source tools tackling different parts of an ML project: FiftyOne and Detectron2. Detectron2 is a library developed by Facebook AI Research designed to allow you to easily train state-of-the-art detection and segmentation algorithms on your own data. FiftyOne is a toolkit designed to let you easily visualize your data, curate high-quality datasets, and analyze your model results.

Together, you can use FiftyOne to curate your custom dataset, use Detectron2 to train a model on your FiftyOne dataset, then evaluate the Detectron2 model results back in FiftyOne to learn how to improve your dataset, continuing the cycle until you have a high-performing model.
This post closely follows the official Detectron2 tutorial, augmenting it to show how to work with FiftyOne datasets and evaluations.

Follow along in Colab! Check out this notebook to follow along with this post right in your browser.

Setup

To start, we'll need to install FiftyOne and Detectron2.

# Install FiftyOne
pip install fiftyone

# Install Detectron2 from source (other options available)
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
# (add --user if you don't have permission)

# Or, to install it from a local clone:
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2

# On macOS, you may need to prepend the above commands with a few environment variables:
CC=clang CXX=clang++ ARCHFLAGS="-arch x86_64" python -m pip install ...

Now let's import FiftyOne and Detectron2 in Python.

Prepare the Dataset

In this post, we show how to use a custom FiftyOne dataset to train a Detectron2 model. We'll train a license plate segmentation model from an existing model pre-trained on the COCO dataset, available in Detectron2's model zoo. Since the COCO dataset doesn't have a "Vehicle registration plate" category, we will use segmentations of license plates from the Open Images v6 dataset in the FiftyOne Dataset Zoo to train the model to recognize this new category. Note: images in the Open Images v6 dataset are under the CC-BY 2.0 license. For this example, we will just use some of the samples from the official "validation" split of the dataset.
To improve model performance, we could always add more data from the official "train" split as well, but that would take longer to train, so we'll stick to the "validation" split for this walkthrough.

Specifying classes when downloading a dataset from the zoo ensures that only samples with at least one of the given classes will be present. However, these samples may still contain other labels, so we can use FiftyOne's powerful filtering capability to easily keep only the "Vehicle registration plate" labels. We will also untag these samples as "validation" and create our own splits out of them.

Next, we need to parse the dataset from FiftyOne's format to Detectron2's format so that we can register it in the relevant Detectron2 catalogs for training. This is the most important code snippet for integrating FiftyOne and Detectron2. Note: in this example, we are specifically parsing the segmentations into bounding boxes and polylines. This function may require tweaks depending on the model being trained and the data it expects.

Let's visualize some of the samples to make sure everything is being loaded properly:

Visualizing the Open Images V6 training dataset in FiftyOne (image by author)

Load the Model and Train!

Following the official Detectron2 tutorial, we now fine-tune a COCO-pretrained R50-FPN Mask R-CNN model on the FiftyOne dataset.
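The core of that FiftyOne-to-Detectron2 parsing is coordinate bookkeeping: FiftyOne stores boxes as relative [x, y, w, h] in [0, 1] with a top-left origin, while Detectron2's XYWH_ABS mode expects absolute pixels. A minimal sketch of just that conversion (the record-building helper and its field values are simplified; the full function also handles the polyline segmentations):

```python
def fiftyone_box_to_detectron2(rel_box, img_width, img_height):
    """Convert a FiftyOne relative [x, y, w, h] box to absolute pixel XYWH.
    Pure arithmetic, so it runs without either library installed."""
    rx, ry, rw, rh = rel_box
    return [rx * img_width, ry * img_height, rw * img_width, rh * img_height]

def sample_to_record(filepath, image_id, width, height, detections):
    """Build one Detectron2-style dataset dict from plain values pulled off a
    FiftyOne sample (a simplified sketch of the parsing function)."""
    return {
        "file_name": filepath,
        "image_id": image_id,
        "width": width,
        "height": height,
        "annotations": [
            {"bbox": fiftyone_box_to_detectron2(d["bounding_box"], width, height),
             "bbox_mode": "XYWH_ABS",  # stands in for detectron2.structures.BoxMode.XYWH_ABS
             "category_id": 0}         # single class: "Vehicle registration plate"
            for d in detections
        ],
    }
```

Registering a function that returns a list of such records with Detectron2's `DatasetCatalog` is what makes the FiftyOne data trainable.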
This will take a couple of minutes to run if using the linked Colab notebook.

# Look at training curves in tensorboard:
tensorboard --logdir output

Tensorboard training metrics visualization (image by author)

Inference & Evaluation Using the Trained Model

Now that the model is trained, we can run it on the validation split of our dataset and see how it performs! To start,